FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (2306.12873v3)

Published 22 Jun 2023 in physics.ao-ph, cs.AI, and cs.LG

Abstract: Over the past few years, due to the rapid development of ML models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree. However, the challenge remains to perform comparably to the ECMWF ensemble mean (EM) in 15-day forecasts. Previous studies have demonstrated the importance of mitigating the accumulation of forecast errors for effective long-term forecasts. Despite numerous efforts to reduce accumulation errors, including autoregressive multi-time step loss, using a single model is found to be insufficient to achieve optimal performance in both short and long lead times. Therefore, we present FuXi, a cascaded ML weather forecasting system that provides 15-day global forecasts with a temporal resolution of 6 hours and a spatial resolution of 0.25 degree. FuXi is developed using 39 years of the ECMWF ERA5 reanalysis dataset. The performance evaluation, based on latitude-weighted root mean square error (RMSE) and anomaly correlation coefficient (ACC), demonstrates that FuXi has comparable forecast performance to ECMWF EM in 15-day forecasts, making FuXi the first ML-based weather forecasting system to accomplish this achievement.
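The abstract evaluates forecasts with the latitude-weighted root mean square error (RMSE). As a rough illustration of how such a metric works (a minimal sketch, not the paper's exact implementation; the grid shape, weight normalization, and function name here are assumptions), grid cells can be weighted by the cosine of their latitude so that the shrinking area of cells near the poles does not dominate the score:

```python
import numpy as np

def latitude_weighted_rmse(forecast, truth, lats):
    """Latitude-weighted RMSE over a (lat, lon) grid.

    Illustrative sketch only: weights are proportional to cos(latitude),
    normalized to average 1, which is one common convention. FuXi's exact
    weighting may differ.
    """
    w = np.cos(np.deg2rad(lats))
    w = w / w.mean()                   # normalize weights to mean 1
    sq_err = (forecast - truth) ** 2   # squared error per grid cell
    weighted = sq_err * w[:, None]     # broadcast latitude weights over lon
    return float(np.sqrt(weighted.mean()))

# Toy example on a tiny 3 x 4 grid (a real 0.25-degree grid is 721 x 1440).
lats = np.array([-45.0, 0.0, 45.0])
truth = np.zeros((3, 4))
forecast = np.ones((3, 4))
print(latitude_weighted_rmse(forecast, truth, lats))  # 1.0
```

Because a uniform error of 1 is unchanged by mean-one weighting, the toy call returns exactly 1.0; with spatially varying errors, mistakes near the equator count more than mistakes near the poles.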

11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 
12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging.
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
3. Balsamo, G., et al.: Recent progress and outlook for the ECMWF integrated forecasting system. EGU23 (EGU23-13110) (2023)
4. Schultz, M.G., et al.: Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. A 379(2194), 20200097 (2021)
5. Rasp, S., et al.: WeatherBench: a benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst. 12(11), e2020MS002203 (2020)
6. Garg, S., Rasp, S., Thuerey, N.: WeatherBench Probability: A benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models. Preprint at https://arxiv.org/abs/2205.00865 (2022)
7. Hersbach, H., et al.: The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020)
8. Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a ResNet pretrained on climate simulations: A new model for WeatherBench. J. Adv. Model. Earth Syst. 13(2), e2020MS002405 (2021)
9. Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), e2020MS002109 (2020)
10. Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), e2022MS003211 (2023)
11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
12. Pathak, J., et al.: FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022)
13. Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier neural operators: Efficient token mixers for transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022)
14. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
15. Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with diffusion model towards high-resolution and high-quality weather forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
16. Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
17. Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
18. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
19. Chen, K., et al.: FengWu: Pushing the skillful global medium-range weather forecast beyond 10 days lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proc. IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018)
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proc. 33rd International Conference on Machine Learning, PMLR, vol. 48, pp. 1050–1059 (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 
12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  5. Rasp, S., et al.: WeatherBench: a benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst. 12(11), e2020MS002203 (2020)
  6. Garg, S., Rasp, S., Thuerey, N.: WeatherBench Probability: a benchmark dataset for probabilistic medium-range weather forecasting along with deep learning baseline models. Preprint at https://arxiv.org/abs/2205.00865 (2022)
  7. Hersbach, H., et al.: The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020)
  8. Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a ResNet pretrained on climate simulations: a new model for WeatherBench. J. Adv. Model. Earth Syst. 13(2), e2020MS002405 (2021)
  9. Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), e2020MS002109 (2020)
  10. Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: a data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), e2022MS003211 (2023)
  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
  12. Pathak, J., et al.: FourCastNet: a global data-driven high-resolution weather model using adaptive Fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022)
  13. Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier neural operators: efficient token mixers for transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022)
  14. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
  15. Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: integrate SwinRNN with diffusion model towards high-resolution and high-quality weather forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
  16. Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
  17. Lam, R., et al.: GraphCast: learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
  18. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
  19. Chen, K., et al.: FengWu: pushing the skillful global medium-range weather forecast beyond 10 days lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
  20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
  21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5325–5334 (2015)
  22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
  23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
  24. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
  25. Liu, Z., et al.: Swin Transformer V2: scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
  26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
  27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
  28. Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
  29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
  30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
  31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
  32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
  33. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
  34. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
  35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
  36. Zhao, Y., et al.: PyTorch FSDP: experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
  37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
  38. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
  39. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
  40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
  41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, PMLR vol. 48, pp. 1050–1059 (2016)
  42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
  43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
  44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
  45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
  46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
  47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
  48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
  49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
  50. Chen, L., et al.: FuXi: a cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 
13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). 
Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 
15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018)
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. 
Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. 
Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  6. Hersbach, H., et al.: The era5 global reanalysis. Q. J. R. Meteorol. Soc. 146(730), 1999–2049 (2020) (8) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Rasp, S., Thuerey, N.: Data-driven medium-range weather prediction with a resnet pretrained on climate simulations: A new model for weatherbench. J. Adv. Model. Earth Syst. 13(2), 2020–002405 (2021) (9) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 
12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Weyn, J.A., Durran, D.R., Caruana, R.: Improving data-driven global weather prediction using deep convolutional neural networks on a cubed sphere. J. Adv. Model. Earth Syst. 12(9), 2020–002109 (2020) (10) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proc. IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (ICLR) (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proc. 33rd International Conference on Machine Learning (ICML), PMLR vol. 48, pp. 1050–1059 (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015) (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hu, Y., Chen, L., Wang, Z., Li, H.: SwinVRNN: A data-driven ensemble forecasting model via learned distribution perturbation. J. Adv. Model. Earth Syst. 15(2), 2022–003211 (2023) (11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. 
Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016) (12) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. 
In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Pathak, J., et al.: Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022) (13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. 
Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022) (14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. 
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
(14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. 
Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
Preprint at https://arxiv.org/abs/2111.13587 (2022)
14. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
15. Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
16. Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023)
17. Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
18. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
19. Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. 
In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(11) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
(12) Pathak, J., et al.: FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. Preprint at https://arxiv.org/abs/2202.11214 (2022)
(13) Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A., Catanzaro, B.: Adaptive Fourier neural operators: Efficient token mixers for transformers. Preprint at https://arxiv.org/abs/2111.13587 (2022)
(14) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
(15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with diffusion model towards high-resolution and high-quality weather forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
(16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
(17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
(18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
(19) Chen, K., et al.: FengWu: Pushing the skillful global medium-range weather forecast beyond 10 days lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling fully sharded data parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: Bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). 
Springer
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021) (15) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. (2023). Preprint at https://doi.org/10.48448/zn7f-fc64 (16) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 
23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 
5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Bi, K., et al.: Accurate medium-range global weather forecasting with 3d neural networks. Nature (2023) (17) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022) (18) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 
234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. 
Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. 
Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
14. Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
15. Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with diffusion model towards high-resolution and high-quality weather forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
16. Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
17. Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
18. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
19. Chen, K., et al.: FengWu: Pushing the skillful global medium-range weather forecast beyond 10 days lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: Bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. 
In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15. Chen, L., Du, F., Hu, Y., Wang, F., Wang, Z.: SwinRDM: Integrate SwinRNN with Diffusion Model Towards High-Resolution and High-Quality Weather Forecasting. Preprint at https://doi.org/10.48448/zn7f-fc64 (2023)
16. Bi, K., et al.: Accurate medium-range global weather forecasting with 3D neural networks. Nature (2023)
17. Lam, R., et al.: GraphCast: Learning skillful medium-range global weather forecasting. Preprint at https://arxiv.org/abs/2212.12794 (2022)
18. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
19. Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018)
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, PMLR vol. 48, pp. 1050–1059 (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018) (19) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. 
IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023) (20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022) (21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 
10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. 
In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
17. Dueben, P.D., Bauer, P.: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 11(10), 3999–4009 (2018)
19. Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. 
In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
  18. Chen, K., et al.: FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead. Preprint at https://arxiv.org/abs/2304.02948 (2023)
  20. Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
  21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5325–5334 (2015)
  22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
  23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
  24. Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 10012–10022 (2021)
  25. Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
  26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241. Springer (2015)
  27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
  28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
  29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
  30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
  31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
  32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
  33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
  34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
  35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (ICLR) (2017)
  36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
  37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
  38. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
  39. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
  40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
  41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
  42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
  43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
  44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
  45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
  46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
  47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
  48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
  49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
  50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015) (22) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. 
In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 
125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(20) Ho, J., et al.: Cascaded diffusion models for high fidelity image generation. J. Mach. Learn. Res. 23(47), 1–33 (2022)
(21) Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 5325–5334 (2015)
(22) Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
(23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
(24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. 
Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
21. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5325–5334 (2015)
22. Tong, Z., Song, Y., Wang, J., Wang, L.: VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022)
23. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: Bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. 
Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). 
Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. 
Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  21. Tong, Z., Song, Y., Wang, J., Wang, L.: Videomae: Masked autoencoders are data-efficient learners for self-supervised video pre-training. In: Advances in Neural Information Processing Systems, vol. 35, pp. 10078–10093 (2022) (23) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. 
Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016) (24) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) (25) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 
11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022) (26) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241 (2015). Springer (27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. Academic Press (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1, 3 (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
22. Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer Normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
24. Liu, Z., et al.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
25. Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. 
Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. 
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008)
(23) Liu, Z., et al.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
(25) Liu, Z., et al.: Swin Transformer V2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
(26) Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
(27) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
(28) Wu, Y., He, K.: Group normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
(29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018)
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for activation functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci.
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 
2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 
100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. 
Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. 
In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
24. Liu, Z., et al.: Swin transformer v2: Scaling up capacity and resolution. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11999–12009 (2022)
26. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-assisted Intervention, pp. 234–241. Springer (2015)
27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
39. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. 
Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023)
  25. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241. Springer (2015)
  27. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
  28. Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018)
  29. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning
  30. Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
  31. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
  32. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
  33. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
  34. Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
  35. Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
  36. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
  37. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
  38. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
  39. Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
  40. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
  41. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, NY, USA (2016)
  42. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
  43. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
  44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
  45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
  46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
  47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
  48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
  49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
  50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  26. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016) (28) Wu, Y., He, K.: Group Normalization. Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wu, Y., He, K.: Group Normalization. 
Preprint at https://arxiv.org/abs/1803.08494 (2018) (29) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. 
Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. 
(30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017)
(31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
(32) Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS 2017 Workshop on Autodiff (2017)
(33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017)
(34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of the 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
(46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
(47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
(48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
(49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
(50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  28. Elfwing, S., Uchibe, E., Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks 107, 3–11 (2018). Special issue on deep reinforcement learning (30) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Ramachandran, P., Zoph, B., Le, Q.V.: Searching for Activation Functions. Preprint at https://arxiv.org/abs/1710.05941 (2017) (31) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. 
In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 
105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. 
(2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  30. Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010) (32) Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS 2017 Workshop on Autodiff (2017) (33) Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017) (34) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017) (35) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 
105(489), 25–35 (2010)
44. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
45. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
46. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
47. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
48. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
49. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), e2018JD029375 (2020)
50. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. 
Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. 
PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. 
Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
(33) Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization. Preprint at https://arxiv.org/abs/1711.05101 (2017)
(35) Larsson, G., Maire, M., Shakhnarovich, G.: FractalNet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017)
(36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
(37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
(38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
(39) Buizza, R., Miller, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
(40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008)
(41) Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016)
(42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000)
(43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
(44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
(45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol.
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. 
Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  34. Larsson, G., Maire, M., Shakhnarovich, G.: Fractalnet: Ultra-deep neural networks without residuals. In: International Conference on Learning Representations (2017) (36) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023) (37) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016) (38) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 
227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 
20(2), 130–148 (1963) (39) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999) (40) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 
1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  35. Zhao, Y., et al.: PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel. Preprint at https://arxiv.org/abs/2304.11277 (2023)
  36. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training Deep Nets with Sublinear Memory Cost. Preprint at https://arxiv.org/abs/1604.06174 (2016)
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  37. Lorenz, E.N.: Deterministic Nonperiodic Flow. J. Atmos. Sci. 20(2), 130–148 (1963)
  38. Buizza, R., Milleer, M., Palmer, T.N.: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Q. J. R. Meteorol. Soc. 125(560), 2887–2908 (1999)
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 
125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  39. Leutbecher, M., Palmer, T.N.: Ensemble forecasting. J. Comput. Phys. 227(7), 3515–3539 (2008) (41) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 
105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 
15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. 
Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  40. Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Proceedings of The 33rd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 48, pp. 1050–1059. PMLR, New York, New York, USA (2016) (42) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. 
Weather Forecast 15(5), 559–570 (2000) (43) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010) (44) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. 
Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Wilks, D.S.: Statistical Methods in the Atmospheric Sciences vol. 100, 3rd edn. (2011) (45) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the rmse of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014) (46) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009) (47) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. 
https://doi.org/10.5281/zenodo.8100201 (2023) Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021) (48) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018) (49) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020) (50) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023) Chen, L., et al.: Fuxi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
  41. Hersbach, H.: Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather Forecast. 15(5), 559–570 (2000)
  42. Sloughter, J.M., Gneiting, T., Raftery, A.E.: Probabilistic wind speed forecasting using ensembles and Bayesian model averaging. J. Am. Stat. Assoc. 105(489), 25–35 (2010)
  43. Wilks, D.S.: Statistical Methods in the Atmospheric Sciences, vol. 100, 3rd edn. (2011)
  44. Fortin, V., Abaza, M., Anctil, F., Turcotte, R.: Why should ensemble spread match the RMSE of the ensemble mean? J. Hydrometeorol. 15(4), 1708–1713 (2014)
  45. Magnusson, L., Nycander, J., Källén, E.: Flow-dependent versus flow-independent initial perturbations for ensemble prediction. Tellus A 61(2), 194–209 (2009)
  46. Du, J., Zheng, F., Zhang, H., Zhu, J.: A multivariate balanced initial ensemble generation approach for an atmospheric general circulation model. Water 13(2), 122 (2021)
  47. Vitart, F., Robertson, A.W., Anderson, D.: Subseasonal to seasonal prediction project: bridging the gap between weather and climate. npj Clim. Atmos. Sci. 1(3) (2018)
  48. Robertson, A.W., Vitart, F., Camargo, S.J.: Subseasonal to seasonal prediction of weather to climate with application to tropical cyclones. J. Geophys. Res. Atmos. 125(6), 2018–029375 (2020)
  49. Chen, L., et al.: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast (Version 1.0) [Dataset] [Software]. Zenodo. https://doi.org/10.5281/zenodo.8100201 (2023)
