HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling with Self-Distillation for Long-Term Forecasting (2401.05012v2)
Abstract: Time series forecasting is a critical and challenging task in practical applications. Pre-trained foundation models for time series forecasting have recently attracted significant interest, but current methods often overlook the multi-scale nature of time series, which is essential for accurate forecasting. To address this, we propose HiMTM, a hierarchical multi-scale masked time series modeling framework with self-distillation for long-term forecasting. HiMTM integrates four key components: (1) a hierarchical multi-scale transformer (HMT) that captures temporal information at different scales; (2) a decoupled encoder-decoder (DED) that directs the encoder toward feature extraction while the decoder focuses on pretext tasks; (3) hierarchical self-distillation (HSD), which provides multi-stage feature-level supervision signals during pre-training; and (4) cross-scale attention fine-tuning (CSA-FT), which captures dependencies between scales in downstream tasks. Together, these components strengthen multi-scale feature extraction in masked time series modeling and improve forecasting accuracy. Extensive experiments on seven mainstream datasets show that HiMTM surpasses state-of-the-art self-supervised and end-to-end learning methods by a considerable margin of 3.16–68.54%. HiMTM also outperforms PatchTST, a strong recent self-supervised learning method, in cross-domain forecasting by a significant margin of 2.3%. The effectiveness of HiMTM is further demonstrated through its application to natural gas demand forecasting.
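The abstract describes the architecture only at a high level. As a rough illustration of two of the structural ideas named there, the hedged PyTorch sketch below builds a hierarchical multi-scale Transformer encoder (tokens are merged pairwise between stages to form coarser scales) and a cross-scale attention head for forecasting. All module names, dimensions, and the patch-merging scheme are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a hierarchical multi-scale encoder plus a cross-scale
# attention head, loosely following the HMT / CSA-FT description in the
# abstract. Everything concrete here (patch size, pairwise merging, head
# design) is an assumption for illustration only.
import torch
import torch.nn as nn


class HierarchicalEncoder(nn.Module):
    """Encode a univariate series into token sequences at several temporal scales."""

    def __init__(self, patch_len=16, d_model=64, n_heads=4, n_stages=3):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)  # patch -> token
        self.stages = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads,
                                       dim_feedforward=4 * d_model,
                                       batch_first=True)
            for _ in range(n_stages)
        )
        # Merge every two adjacent tokens to move to the next (coarser) scale.
        self.merge = nn.ModuleList(
            nn.Linear(2 * d_model, d_model) for _ in range(n_stages - 1)
        )

    def forward(self, x):  # x: (B, L) with L divisible by patch_len
        tokens = self.embed(x.unfold(-1, self.patch_len, self.patch_len))  # (B, N, d)
        scales = []
        for i, stage in enumerate(self.stages):
            tokens = stage(tokens)
            scales.append(tokens)                     # features at scale i
            if i < len(self.merge):
                b, n, d = tokens.shape
                tokens = tokens[:, : n - n % 2].reshape(b, n // 2, 2 * d)
                tokens = self.merge[i](tokens)        # (B, N/2, d)
        return scales                                 # fine -> coarse


class CrossScaleForecaster(nn.Module):
    """Forecasting head: finest-scale tokens attend to all coarser scales."""

    def __init__(self, encoder, d_model=64, n_heads=4, horizon=96):
        super().__init__()
        self.encoder = encoder
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):
        scales = self.encoder(x)
        fine, coarse = scales[0], torch.cat(scales[1:], dim=1)
        fused, _ = self.cross_attn(fine, coarse, coarse)  # cross-scale dependencies
        return self.head(fused.mean(dim=1))               # (B, horizon)


if __name__ == "__main__":
    model = CrossScaleForecaster(HierarchicalEncoder(), horizon=96)
    y = model(torch.randn(8, 512))  # 8 series of length 512
    print(y.shape)                  # torch.Size([8, 96])
```

In HiMTM the encoder is pre-trained with masked reconstruction and hierarchical self-distillation before a forecasting head is attached; the sketch above covers only the multi-scale encoder and a cross-scale fine-tuning head, not the pre-training pipeline.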
- Data2vec: A general framework for self-supervised learning in speech, vision and language. In International Conference on Machine Learning, pages 1298–1312. PMLR, 2022.
- Deep learning for time series forecasting: Tutorial and literature survey. ACM Computing Surveys, 55(6):1–36, 2022.
- Multi-scale adaptive graph neural network for multivariate time series forecasting. IEEE Transactions on Knowledge and Data Engineering, 2023.
- Context autoencoder for self-supervised representation learning. International Journal of Computer Vision, pages 1–16, 2023.
- Long sequence time-series forecasting with deep learning: A survey. Information Fusion, 97:101819, 2023.
- TimeMAE: Self-supervised representations of time series with decoupled masked autoencoders. arXiv preprint arXiv:2303.00320, 2023.
- Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016.
- SimMTM: A simple pre-training framework for masked time-series modeling. arXiv preprint arXiv:2302.00861, 2023.
- AdaRNN: Adaptive learning and forecasting of time series. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 402–411, 2021.
- Preformer: Predictive transformer with multi-scale segment-wise correlations for long-term time series forecasting. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
- Label-efficient time series representation learning: A review. arXiv preprint arXiv:2302.06433, 2023.
- Self-supervised representation learning: Introduction, advances, and challenges. IEEE Signal Processing Magazine, 39(3):42–62, 2022.
- Time-series data mining. ACM Computing Surveys (CSUR), 45(1):1–34, 2012.
- Bootstrap your own latent: A new approach to self-supervised learning. Advances in Neural Information Processing Systems, 33:21271–21284, 2020.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
- Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16000–16009, 2022.
- Large models for time series and spatio-temporal data: A survey and outlook. arXiv preprint arXiv:2310.10196, 2023.
- SMARTformer: Semi-autoregressive transformer with efficient integrated window attention for long time series forecasting. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 2169–2177, 2023.
- Ti-MAE: Self-supervised masked time series autoencoders. arXiv preprint arXiv:2301.08871, 2023.
- Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194):20200209, 2021.
- Frequency-aware masked autoencoders for multimodal pretraining on biosignals. arXiv preprint arXiv:2309.05927, 2023.
- A survey on time-series pre-trained models. arXiv preprint arXiv:2305.10716, 2023.
- A time series is worth 64 words: Long-term forecasting with transformers. arXiv preprint arXiv:2211.14730, 2022.
- Masked autoencoders for point cloud self-supervised learning. In European Conference on Computer Vision, pages 604–621. Springer, 2022.
- Scaleformer: Iterative multi-scale refining transformers for time series forecasting. arXiv preprint arXiv:2206.04038, 2022.
- Pre-training enhanced spatial-temporal graph neural network for multivariate time series forecasting. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1567–1577, 2022.
- MICN: Multi-scale local and global context modeling for long-term series forecasting. In The Eleventh International Conference on Learning Representations, 2023.
- Learning latent seasonal-trend representations for time series forecasting. Advances in Neural Information Processing Systems, 35:38775–38787, 2022.
- Transformers in time series: A survey. In International Joint Conference on Artificial Intelligence (IJCAI), 2023.
- CoST: Contrastive learning of disentangled seasonal-trend representations for time series forecasting. arXiv preprint arXiv:2202.01575, 2022.
- Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems, 34:22419–22430, 2021.
- TimesNet: Temporal 2D-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations, 2023.
- TS2Vec: Towards universal representation of time series. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8980–8987, 2022.
- Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11121–11128, 2023.
- A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2114–2124, 2021.
- Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations, 2023.
- Self-supervised contrastive pre-training for time series via time-frequency consistency. Advances in Neural Information Processing Systems, 35:3988–4003, 2022.
- A survey on masked autoencoder for visual self-supervised learning. In Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pages 6805–6813, 2023.
- Self-supervised learning for time series analysis: Taxonomy, progress, and prospects. arXiv preprint arXiv:2306.10125, 2023.
- Multi-resolution time-series transformer for long-term forecasting. arXiv preprint arXiv:2311.04147, 2023.
- SimTS: Rethinking contrastive representation learning for time series forecasting. arXiv preprint arXiv:2303.18205, 2023.
- FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In International Conference on Machine Learning, pages 27268–27286. PMLR, 2022.
Authors: Shubao Zhao, Ming Jin, Zhaoxiang Hou, Chengyi Yang, Zengxiang Li, Qingsong Wen, Yi Wang